![]() |
The Twofish Encryption Algorithm: A 128-Bit Block Cipher
by Bruce Schneier ; John Kelsey ; Doug Whiting ; David Wagner ; Chris Hall ; Niels Ferguson Wiley Computer Publishing, John Wiley & Sons, Inc. ISBN: 0471353817 Pub Date: 03/01/99 |
Previous | Table of Contents | Next |
Difference sequences in A and B translate into difference sequences in (K2i, K2i+1). However, while it is natural to consider A and B difference sequences in terms of XOR differences, subkeys can reasonably be considered either as XOR differences or as differences modulo 232. Thus, we may discuss difference sequences:
D[i,M, M*] = Ki,M - Ki,M*
X[i,M, M*] = Ki,M ⊕ Ki,M*
where the difference is computed between the key value M and M*.
XOR Differences in the Subkeys. Each round, the subkeys are added to the results of the PHT of two g functions, and the results of those additions are XORed into half of the cipher block. An XOR difference in the subkeys has a fairly high probability of passing through the addition operation and ending up in the cipher block. (The probability of this is determined by the Hamming weight of the XOR difference, not counting the highest-order bit.) However, to get into the subkeys, an XOR difference must first pass through the first addition.
Consider
x + y = z
(x ⊕ δ0)+ y = z ⊕ δ1
Let k be the number of bits set in δ0 , not counting the highest-order bit. Then, the highest probability value for δ1 is δ0, and the probability that this will hold is 2-k. This is true because addition and XOR are very closely related operations. The only difference between the two is the carry between bit positions. If flipping a given bit changes the carry into the next bit position, this alters the output XOR difference. This happens with probability 1/2 per bit. The situation is more complex for multiple adjacent bits, but the general rule still holds: for every bit in the XOR difference not in the high-order bit position, the probability that the difference will pass through correctly is cut in half.
For the subkey generation, consider an XOR difference, δ0, in A. This affects two subkey words:
K2i = Ai + Bi
K2i+1 = ROL(Ai + 2Bi, 9)
where the additions are modulo 232. If we assume these XOR differences propagate independently in the two subkeys (which appears to be the case), we see that this leads to an XOR difference of δ0 in the even subkey word with probability 2-k, and the XOR difference ROL(δ0, 9) in the odd subkey with the same probability. The most probable XOR difference in the rounds subkey block thus occurs with probability 2-2k. A desired XOR difference sequence for all 20 pairs of subkey words is thus quite difficult to get to work when k ≥ 3, assuming the desired XOR difference sequence can be created in the A sequence at all.
When the XOR difference is in B, the result is slightly more complicated; the most probable XOR difference in a rounds pair of subkey words may be either 2-(2k-1) or 2-2k, depending on whether or not the XOR difference in B covers the next-to-highest-order bit.
Additive Differences in the Subkeys. An XOR difference in A or B is easy to analyze in terms of additive differences modulo 232: a XOR difference with k active bits has 2k equally likely additive differences. Note that if we have an additive difference in A, we get it in both subkey words, just rotated left nine bits in the odd subkey word. Thus, k-bit XOR differences lead to a given additive difference in a pair of subkey words with probability 2-2k. (The rotation does not really complicate things much for the attacker, who knows where the changed bits are.)
The round function F takes a 64-bit input and produces a 64-bit output; and F is characterized by the four S-boxes si and the round keys K2r+8, K2r+9. The S-boxes depend on N /2 bits of key material S. We show that all the F functions generated in this way are distinct. Our test for distinctness of the round functions concentrates on only a single bit of the output, which will be sufficient to prove uniqueness.
In particular, consider the value
F1 = (T0 + 2T1 + K2r+9) mod 232
where T0 =g(R0 and T1 =g(ROL(R1, 8)). Because the g function involves an MDS matrix multiply, which uses the XOR operation, it is difficult to analyze the full F1 value due to the interaction between operations of different algebraic groups (i.e., XOR and addition). However, if we examine only the least significant bit (1sb) of F1, we can ignore carries, so the operation can be analyzed entirely Using XOR. Note that this bit does not depend on R1, due to the multiplication by 2 in the PHT. Further, the MDS matrix element that maps the S-box output is a simple linear transformation (i.e., multiplication by a GF(28) field element). Thus, for each S-box, the effect of the final fixed permutation (q0 or q1) of the S-box and the MDS multiply on the lsb of F1 is a simple fixed mapping from eight bits to one bit.
Now, consider two keys with distinct values for S. Since the S values are different, at least one of the four S-boxes must have different key material under the two different keys. To prove uniqueness, we simply fix the inputs to the remaining three S-boxes, so that their XOR contribution to the lsb of F1 is fixed. Up to a fixed XOR constant (based on K2r+9 and the other three S-boxes), we are then left with a simple function involving a single S-box that maps eight bits to one bit, with N/8 bits of key material used in the S-box. We can remove dependence on the fixed constant by constructing sequences consisting of the XOR of two bits in the S-box/MDS lsb output sequence, similar to the method discussed in section 8.1.2. If this sequence of bits is unique across the all 2N/8 possible key material values for the S-box, then the F function is unique with respect to that S-box. If all four S-boxes are unique in this way, then the F function is unique for each distinct value of S.
To remove the dependence on the constant bit, we performed our search on a modified sequence in which each bit was the XOR of two lsbs of the S-box/MDS output values. Since the si mapping has 256 inputs, this limits the sequence down to only 255 values, which is still easily sufficient to distinguish between F functions. Let us first consider the 256-bit key case, which obviously involves the longest search. Note, for example,
s0(x) = q1[q0[q0[q1[q1[x] ⊕ k3] ⊕ k2] ⊕ k1] ⊕ k0]
where the kj bytes are a subset of the bytes in S. There are 232 such functions, but, unfortunately, since we are dealing only with the lsb, there is no obvious method to cut down the search time by a factor of 256, as we were able to do in section 8.1.2. Similar sequences were produced for the smaller key sizes, with all sequences for each S-box combined across key sizes in the test, for a total of NF = 232 + 224 + 216 sequences per S-box.
To avoid a birthday surprise collision with NF sequences, we require more than 64 bits of each sequence. Even with only 64 bits, however, a total of over 32 GB of memory would be required, a size far beyond the budget of this experiment. Thus, this test was also performed in multiple passes, with each pass generating all NF sequences but discarding those values not falling into the selected bin for the particular pass, as before. It was found empirically that the performance time was roughly proportional to the number of passes; in other words, the sort/compare time for a given pass was considerably shorter than the O(232) time required to generate the filtered list. For example, on a 200 MHz Pentium computer with slightly over 256MB of available RAM, 128 passes are required, which empirically were completed for a single S-box in slightly less than three days. Each sequence value in the list consisted of 64 bits, with additional sequence bits used to filter out values not to be used in a given pass. For a 128-pass test, this means that seven extra bits were used to help avoid a birthday surprise collision. The filtering code (in C) was carefully optimized so that most of the values to be filtered on each pass were quickly rejected. Since 127 of every 128 values were filtered out on each pass, this simple optimization sped up performance considerably, without requiring particularly fast generation of the 64-bit sequence values to be included.
Previous | Table of Contents | Next |